Layout and Language: Lists and Tables in Technical Documents
Abstract
In this paper, we describe some of the interactions between layout and language that we have encountered in recent applied NLP projects. We present two complementary views of lists and tables, intended to bridge the gap between treating them as a type of running text (which linguistics knows how to deal with) and treating them as a multi-dimensional relation represented in two dimensions, which may have many reading paths (which linguistics does not know how to deal with). Linguistic and world knowledge, stated or inferred in the text surrounding tables and lists, provides a context for interpreting a set of tuples extracted from those tables or lists, together with heuristics about how multi-dimensional information is projected onto two dimensions.
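As a rough illustration of the "relation projected onto two dimensions" view, the following sketch reads a small table along two different reading paths and shows that both yield the same set of tuples. It is not the authors' method: the function `table_to_tuples`, its parameters, and the sample table are all hypothetical, chosen only to make the idea concrete.

```python
from typing import List, Tuple

# One extracted fact: (row header, column header, cell value).
Triple = Tuple[str, str, str]


def table_to_tuples(
    col_headers: List[str],
    row_headers: List[str],
    cells: List[List[str]],
    by: str = "row",
) -> List[Triple]:
    """Flatten a simple header-by-header table into triples.

    `by` selects the reading path: "row" reads left to right, row by row;
    "column" reads top to bottom, column by column.
    """
    if by == "row":
        return [
            (r, c, cells[i][j])
            for i, r in enumerate(row_headers)
            for j, c in enumerate(col_headers)
        ]
    if by == "column":
        return [
            (r, c, cells[i][j])
            for j, c in enumerate(col_headers)
            for i, r in enumerate(row_headers)
        ]
    raise ValueError(f"unknown reading path: {by!r}")


# Hypothetical sample table (invented for illustration only).
col_headers = ["voltage", "current"]
row_headers = ["model A", "model B"]
cells = [
    ["5 V", "2 A"],
    ["12 V", "1 A"],
]

row_path = table_to_tuples(col_headers, row_headers, cells, by="row")
col_path = table_to_tuples(col_headers, row_headers, cells, by="column")

# The two reading paths order the facts differently but carry the same relation.
same_relation = set(row_path) == set(col_path)
```

The ordering differs between the two lists, which is the part a running-text view cares about, while the underlying set of tuples is identical, which is the part a relational view cares about.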
Similar resources
Layout and Language: A Corpus of Documents Containing Tables
This paper describes the collection of a corpus of documents that contain one or more tables. Some results are then presented which go some way to characterising the table in terms of its relationship to the content of the document it appears in.
Extensible layout in functional documents
Highly customised variable-data documents make automatic layout of the resulting publication hard. Architectures for defining and processing such documents can benefit if the repertoire of layout methods available can be extended smoothly and easily to accommodate new styles of customisation. The Document Description Framework incorporates a model for declarative document layout and processing ...
Table recognition in mathematical documents
While a number of techniques have been developed for table recognition in ordinary text documents, when dealing with tables in mathematical documents these techniques are often ineffective as tables containing mathematical structures can differ quite significantly from ordinary text tables. In fact, it is even difficult to clearly distinguish table recognition in mathematics from layout analysi...
An Annotated Corpus and Method for Analysis of Ad-Hoc Structures Embedded in Text
We describe a method for identifying and performing functional analysis of structured regions that are embedded in natural language documents, such as tables or key-value lists. Such regions often encode information according to ad hoc schemas and avail themselves of visual cues in place of natural language grammar, presenting problems for standard information extraction algorithms. Unlike prev...
Summary of Decision Analysis Applications in the Operations Research Literature, 1990-2001
This technical report summarizes applications of decision analysis that appeared in major English language operations research (OR) journals and other closely related journals from 1990 through 2001. The primary purpose of this report is to provide backup information for Keefer et al. (2004), which discusses trends in decision analysis applications. While this technical report s...